Feature Reinforcement Learning: State of the Art

Authors

  • Mayank Daswani
  • Peter Sunehag
  • Marcus Hutter
Abstract

Feature reinforcement learning was introduced five years ago as a principled and practical approach to history-based learning. This paper examines the progress since its inception. We now have both model-based and model-free cost functions, most recently extended to the function-approximation setting. Our current work is geared towards playing ATARI games using imitation learning, where we use feature RL as a feature-selection method for high-dimensional domains.

This paper is a brief summary of the progress so far in the Feature Reinforcement Learning (FRL) framework (Hutter 2009a), along with a short section on current research. FRL focuses on the general reinforcement learning problem, in which an agent interacts with an environment in cycles of action, observation and reward. The goal of the agent is to maximise an aggregation of the rewards. The most traditional form of this general problem constrains the observations (and rewards) to be states that satisfy the Markov property, i.e. $P(o_t \mid o_{1:t-1}) = P(o_t \mid o_{t-1})$, and is called a Markov Decision Process (MDP) (Puterman 1994). A less constrained form is the Partially Observable Markov Decision Process (Kaelbling, Littman, and Cassandra 1998), in which the observations are generated from some unobservable underlying MDP.

Feature Reinforcement Learning (Hutter 2009a) is one way of dealing with the general RL problem: it reduces the problem to an MDP. It aims to construct a map from the history of an agent, i.e. its action-observation-reward cycles so far, to an MDP state. Traditional RL methods can then be used on the derived MDP to form a policy (a mapping from these states to actions). FRL thus belongs to the category of history-based approaches. U-tree (McCallum 1996) is a different example of a history-based approach; it uses a tree-based representation of the value function in which nodes are split according to a local criterion. The cost in FRL is global: maps are accepted or rejected based on an evaluation of the whole map.

While the idea behind FRL is simple, there are several choices to be made. What space do we draw the maps from, and how do we pick the one that fits our data so far? In the best case, we would like to choose a map φ from the space of all possible (computable) functions on histories, but this is intractable in practice, and the choice of a smaller hypothesis class can encode useful knowledge and improve learning speed. We define a cost function that ideally measures how well φ maps the process to an MDP. The problem of searching through the map class for the best map φ* is addressed via a stochastic search method.

Taking a step back from the history-based learning problem, we can frame the general RL problem as trying to find a map from a very high-dimensional input space, namely that of all possible histories, to a policy representation that allows us to perform well in the given environment. This policy representation is often in the form of a value function, but it does not have to be. The model-based feature RL framework (Hutter 2009a; 2009b) tries to build an MDP space first, and then find a value function for that MDP. A model-free approach (Daswani, Sunehag, and Hutter 2013) goes straight for the value-function representation without trying to build an MDP model, and this approach easily extends to function approximation.
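To make the framework concrete, the following is a minimal sketch of the FRL loop under our own illustrative assumptions, not the papers' exact definitions: a candidate map reduces the history to a discrete state, a global cost scores each candidate map on the data collected so far, and tabular Q-learning runs on the derived MDP. All function names and the stand-in cost below are hypothetical.

```python
from collections import defaultdict

# A minimal, hypothetical sketch of the FRL loop (names are illustrative,
# not from the papers): a map phi reduces the agent's history to a
# discrete state, a global cost scores candidate maps on the data so far,
# and ordinary tabular Q-learning runs on the derived MDP.

def phi_last_k(history, k=2):
    """Candidate map: the derived state is the last k
    (action, observation, reward) triples of the history."""
    return tuple(history[-k:])

def cost(phi, history):
    """Global score of a map. This is a crude stand-in that trades off
    how well the derived states predict the next reward against the
    number of derived states; the actual model-based and model-free FRL
    costs are defined in (Hutter 2009a) and
    (Daswani, Sunehag, and Hutter 2013)."""
    rewards_per_state = defaultdict(list)
    for t in range(len(history) - 1):
        rewards_per_state[phi(history[:t + 1])].append(history[t + 1][2])
    fit = sum(
        sum((r - sum(rs) / len(rs)) ** 2 for r in rs)
        for rs in rewards_per_state.values()
    )
    return fit + len(rewards_per_state)  # prediction error + complexity

def best_map(candidates, history):
    """Map-search placeholder: pick the lowest-cost candidate. The papers
    use stochastic search (e.g. simulated annealing) over a structured
    map class instead of exhaustive evaluation."""
    return min(candidates, key=lambda phi: cost(phi, history))

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update on the derived MDP
    (Q is assumed to be a defaultdict(float) keyed by (state, action))."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

The stand-in cost has the right shape, a data-fit term plus a complexity penalty, but it is the papers' actual costs (the model-based coding cost and the model-free value-based cost) that make the framework principled.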
Note that this representation of the general RL problem as a problem in a very high-dimensional input space allows us to use feature RL in the traditional learning setting, for feature selection in function-approximation problems. Instead of features of the history, our features are now those of the MDP state. The cost function then selects the smallest subset of features that can represent our model or value function; a toy sketch of this idea is given at the end of this section. Our current work uses the value-based cost, in both the off-policy and on-policy settings, on domains within the scope of the Arcade Learning Environment (Bellemare et al. 2013).

The outline of this paper is as follows. Section 1 introduces notation and relevant background, Section 2 discusses related work, and Section 3 examines the cost functions studied in the FRL setting so far and summarises the successes of the method. We conclude in Section 4.
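As an illustration of the value-based feature selection described above, here is a hedged sketch under our own assumptions: a candidate subset of state features is scored by how well a linear value function fitted on that subset predicts observed returns, plus a penalty on the subset's size. The names and the exact cost are hypothetical, not those of Daswani, Sunehag, and Hutter (2013).

```python
import numpy as np

# A hypothetical sketch of value-based feature selection: score a
# candidate subset of state features by the regression error of a linear
# value function fitted on that subset, plus a penalty on subset size.
# Names and the exact cost are illustrative, not the papers'.

def value_cost(X, returns, subset, penalty=1.0):
    """X: (n_samples, n_features) matrix of state features;
    returns: observed returns from the corresponding states;
    subset: indices of the candidate feature subset."""
    Xs = X[:, list(subset)]
    w, *_ = np.linalg.lstsq(Xs, returns, rcond=None)  # fit linear value fn
    fit = float(np.sum((Xs @ w - returns) ** 2))      # value-prediction error
    return fit + penalty * len(subset)                # error + size penalty

def select_features(X, returns, max_features=10, penalty=1.0):
    """Greedy forward selection: add features while the cost decreases.
    This is one simple search strategy; the papers use stochastic search."""
    chosen, remaining = [], set(range(X.shape[1]))
    best = value_cost(X, returns, chosen, penalty)
    while remaining and len(chosen) < max_features:
        f = min(remaining,
                key=lambda j: value_cost(X, returns, chosen + [j], penalty))
        c = value_cost(X, returns, chosen + [f], penalty)
        if c >= best:
            break
        chosen.append(f)
        remaining.remove(f)
        best = c
    return chosen
```

Without the size penalty the full feature set always wins on fit alone; the penalty term is what drives the search towards the smallest subset that can still represent the value function.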





Publication date: 2014